PPDM: Output Privacy
...other than differential privacy, including: statistical disclosure control, query restriction/auditing, etc.
SoK
Nabil R. Adam, John C. Worthmann
cited by ~1300
nrryuya.icon > Good summary and comparison of naive, intuitive approaches
https://gyazo.com/3b97ab8c4615c07e415ee41a70208d79
Query restriction approach
Query-Set-Size Control -> too easy to compromise
Query-Set-Overlap Control -> not practical
Query Auditing -> high CPU time and storage requirements (at the time)
Partitioning (similar to k-anonymization), cell suppression
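The "too easy to compromise" verdict on query-set-size control can be seen with a classic tracker attack: two queries that each pass the size check, but whose difference isolates one individual. A toy sketch (the micro-database, column layout, and threshold are invented for illustration):

```python
# Hypothetical micro-database (rows are (name, dept, salary)) showing why
# query-set-size control is easy to compromise: a "tracker" pair of
# queries, each individually large enough to be answered, whose
# difference reveals one individual's salary.
ROWS = [
    ("alice", "eng", 100), ("bob", "eng", 90),
    ("carol", "eng", 95), ("dan", "eng", 85),
    ("erin", "hr", 70), ("frank", "hr", 72), ("gina", "hr", 75),
]
K = 3  # minimum query-set size the SDB enforces

def sum_query(pred):
    matched = [r for r in ROWS if pred(r)]
    if len(matched) < K:
        raise PermissionError("query set too small")
    return sum(r[2] for r in matched)

# Both queries pass the size check (4 and 3 rows respectively)...
all_eng = sum_query(lambda r: r[1] == "eng")
eng_without_alice = sum_query(lambda r: r[1] == "eng" and r[0] != "alice")
# ...yet their difference is exactly alice's confidential salary.
alice_salary = all_eng - eng_without_alice
```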
Data perturbation -> suffers from a bias problem
Probability-distribution category -> partial disclosure is easy for large SDBs
Fixed-data perturbation category -> limited to one confidential attribute.
Output perturbation
Random-Sample Queries
Varying-Output Perturbation
Rounding (Systematic/Random/Controlled) -> not effective (but can be combined with other methods)
V. Ciriani, S. De Capitani di Vimercati, S. Foresti, and P. Samarati
cited by ~100
Picking up sections about output privacy.
Section 5.7 Mine-and-Anonymize
Enforcing k-Anonymity on Association Rules/Decision Trees
Example of an inference channel violating k-anonymity and the solution by modifying the support of involved itemsets
https://gyazo.com/647cba10676f4d0f57c53d9ed21cca2d
Chapter 8: A Survey of Quantification of Privacy Preserving Data Mining Algorithms (pdf) Elisa Bertino, Dan Lin, Wei Jiang
Section 8.2.2 Result Privacy
Cites "When do data mining results violate privacy?" (ACM SIGKDD'04)
Inference channel in Bayesian classifiers
Section 8.4.2 Quality of the Data Mining Results
Introduces the metrics for classification, association rules
Chapter 9: A Survey of Utility-based Privacy-Preserving Data Transformation Methods
Ming Hua, Jian Pei
https://gyazo.com/f9980f0c0e4508317609bc2838279eae
Section 9.3 Utility-Based Anonymization Using Local Recoding
Improving the query answering accuracy on anonymized tables
Utility measure: normalized certainty penalty (NCP) i.e., normalized interval size after generalization
nrryuya.icon > not tailored to the actual queries
Algorithm: Bottom-up/top-down k-anonymization
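For a numeric attribute, the NCP of a generalized tuple reduces to interval width over domain width, summed (optionally weighted) over the QI attributes. A minimal sketch; function names and the weighting scheme are my own reading of the section, not the chapter's code:

```python
# Normalized certainty penalty (NCP) for numeric attributes: the width
# of the generalized interval divided by the width of the attribute's
# domain. An exact (ungeneralized) value costs 0.
def ncp_numeric(interval, domain):
    lo, hi = interval
    dom_lo, dom_hi = domain
    return (hi - lo) / (dom_hi - dom_lo)

def ncp_tuple(intervals, domains, weights=None):
    # Weighted sum of per-attribute penalties for one generalized tuple.
    if weights is None:
        weights = [1.0] * len(intervals)
    return sum(w * ncp_numeric(iv, dom)
               for w, iv, dom in zip(weights, intervals, domains))

# Age generalized to [20, 30] over domain [0, 100] costs 0.1.
```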
Section 9.4 The Utility-based Privacy Preserving Methods in Classification Problems
The top-down specialization (TDS): generalizes tuples in a table such that tuples in the same equivalence class (i.e., tuples sharing the same values on the QI) are as pure as possible with respect to class labels.
the information gain is measured by the entropy reduction on the affected equivalence classes
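The entropy-reduction score can be sketched as follows (a minimal reading of the description above, not the chapter's code): the gain of a specialization is the class-label entropy of the affected equivalence class minus the size-weighted entropy of the classes it splits into.

```python
from collections import Counter
from math import log2

def entropy(labels):
    # Shannon entropy of the class-label distribution in one
    # equivalence class.
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(parent_labels, child_groups):
    # Entropy reduction when a specialization splits the parent
    # equivalence class into child_groups.
    n = len(parent_labels)
    after = sum(len(g) / n * entropy(g) for g in child_groups)
    return entropy(parent_labels) - after

# Splitting a mixed class into two pure ones yields the maximal gain.
gain = info_gain(["+", "+", "-", "-"], [["+", "+"], ["-", "-"]])
```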
The progressive disclosure algorithm (PDA): to eliminate sensitive inferences, it suppresses some attribute values so that the confidence of each inference rule is kept below a user-defined threshold.
the information gain is defined as the entropy reduction on the tuples involving the disclosed value
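The suppression side of PDA can be sketched as a loop that suppresses the sensitive value from supporting records until the rule's confidence falls below the threshold (function names and the record-selection order are mine, for illustration only):

```python
def confidence(transactions, lhs, rhs):
    # conf(lhs -> rhs) = supp(lhs ∪ rhs) / supp(lhs)
    n_lhs = sum(1 for t in transactions if lhs <= t)
    n_both = sum(1 for t in transactions if (lhs | rhs) <= t)
    return n_both / n_lhs if n_lhs else 0.0

def suppress_until_safe(transactions, lhs, rhs, threshold):
    # Suppress the sensitive (rhs) value in one supporting record at a
    # time until the inference rule's confidence drops below threshold.
    txs = [set(t) for t in transactions]
    while confidence(txs, lhs, rhs) >= threshold:
        t = next(t for t in txs if (lhs | rhs) <= t)
        t -= rhs
    return txs

safe = suppress_until_safe(
    [{"job=eng", "hiv"}, {"job=eng", "hiv"}, {"job=eng"}, {"job=eng"}],
    {"job=eng"}, {"hiv"}, threshold=0.5)
```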
Section 9.5 Anonymized Marginal: Injecting Utility into Anonymized Data Sets
A marginal (i.e., SELECT COUNT(*) FROM t GROUP BY A) is anonymized if some of its attribute values are generalized.
A utility measure is defined as the difference between the distribution of the original data and that of the anonymized data (KL-divergence).
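The utility loss of an anonymized marginal is then the KL-divergence between the original distribution P and the distribution Q induced by the generalized data. A small sketch with invented example values:

```python
from math import log2

def kl_divergence(p, q):
    # D_KL(P || Q) in bits; the larger it is, the more the anonymized
    # marginal distorts the original distribution.
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Original marginal over four cells vs. the uniform marginal obtained
# when all four values are generalized into one interval.
p = [0.5, 0.25, 0.125, 0.125]
q = [0.25, 0.25, 0.25, 0.25]
loss = kl_divergence(p, q)
```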
Chapter 10: Mining Association Rules under Privacy Constraints
Jayant R. Haritsa
Privacy Metric: Average Privacy, Worst-case Privacy, Re-interrogated Privacy, Amplification Privacy
Accuracy Metric: Support Error, Identity Error
Literature 1: Input data privacy (MASK, Cut-and-Paste Operator, Algebraic-distortion)
FRAPP: a generalized matrix-theoretic framework for the design of random perturbation
Literature 2: Output rule privacy (support/confidence-based hiding, data blocking)
See also: Chapter 11
Experimental result
https://gyazo.com/987d1f412afb576d2581e6539771ce91
Chapter 11: A Survey of Association Rule Hiding Methods for Privacy
Vassilios S. Verykios, Aris Gkoulalas-Divanis
Association rule hiding, i.e., it is not the data but the sensitive rules that create a privacy breach
Taxonomy: 1. support/confidence-based, 2. distortion/blocking, 3. single/multiple rule, 4. heuristic/exact, 5. user specified sensitive rules
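A toy support-based, distortion-style hider in the spirit of this taxonomy (the item-selection heuristic is mine): delete one item of the sensitive itemset from supporting transactions until its support falls below the mining threshold, so the rule is no longer discovered.

```python
def support(transactions, itemset):
    # Fraction of transactions containing the whole itemset.
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def hide_itemset(transactions, itemset, min_sup):
    # Distortion: repeatedly remove one item of the sensitive itemset
    # from a supporting transaction until supp(itemset) < min_sup.
    txs = [set(t) for t in transactions]
    victim = min(itemset)  # deterministic choice of the item to distort
    while support(txs, itemset) >= min_sup:
        t = next(t for t in txs if itemset <= t)
        t.discard(victim)
    return txs

hidden = hide_itemset(
    [{"a", "b"}, {"a", "b"}, {"a"}, {"b"}], {"a", "b"}, min_sup=0.5)
```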
Chapter 12: A Survey of Statistical Approaches to Preserving Confidentiality of Contingency Table Entries
Stephen E. Fienberg, Aleksandra B. Slavkovic
On association rule mining:
Support is a marginal table, and confidence is a conditional table, both corresponding to a subset of variables making up the full table.
Identification of cells in contingency table (Theorem 12.1)
Trivially, for bivariate tables, the joint probability distribution is the support, so together with knowledge of the sample size n, an association rule reveals all cell counts. The above results also imply that releasing the confidence of a rule along with some marginal information again identifies all entries in the table.
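The bivariate case can be made concrete: for a 2x2 table over binary A and B, the released supports (as fractions) plus the sample size n pin down every cell count. The numbers below are invented for illustration.

```python
def cells_from_supports(n, s_a, s_b, s_ab):
    # Recover all four cells of a 2x2 contingency table over binary
    # attributes A, B from sample size n and supports supp(A), supp(B),
    # supp(A,B).
    n11 = round(n * s_ab)        # A=1, B=1
    n10 = round(n * s_a) - n11   # A=1, B=0
    n01 = round(n * s_b) - n11   # A=0, B=1
    n00 = n - n11 - n10 - n01    # A=0, B=0
    return n11, n10, n01, n00

# With n=100, supp(A)=0.6, supp(B)=0.5, supp(A,B)=0.3, the full table
# is determined: nothing about the cells remains hidden.
table = cells_from_supports(100, 0.6, 0.5, 0.3)
```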
Linear programming (LP) relaxation bounds, Integer programming (IP) relaxation bounds
https://gyazo.com/922d762349b6fa43d29af0de093f968e
Chapter 16: Private Data Analysis via Output Perturbation
Kobbi Nissim
ε-differential privacy
Application: Sum, Mean, Covariance, Histograms, Subset Sum, Distance to Property, k-Means Clustering, SVD, PCA, Learning in the Statistical Queries
Beyond the Basics: Instance Based Noise and Smooth Sensitivity, The Sample-Aggregate Framework, A General Sanitization Mechanism
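The basic output-perturbation recipe from this chapter, applied to SUM: add Laplace noise scaled to the query's global sensitivity. This is a sketch; the clipping bounds and function names are mine.

```python
import math
import random

def laplace_sample(scale, rng):
    # Inverse-CDF sampling of Laplace(0, scale).
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_sum(values, upper, epsilon, rng=None):
    # epsilon-differentially private SUM via the Laplace mechanism.
    # Each value is clipped into [0, upper], so one record changes the
    # sum by at most `upper` (the global sensitivity), and noise of
    # scale upper/epsilon suffices.
    rng = rng or random.Random()
    clipped = [min(max(v, 0.0), upper) for v in values]
    return sum(clipped) + laplace_sample(upper / epsilon, rng)
```

With a large epsilon the noise is tiny and the answer is close to the clipped sum; shrinking epsilon trades accuracy for privacy.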
Chapter 17: A Survey of Query Auditing Techniques for Data Privacy (pdf) Shubha U. Nabar (Stanford University) et al.
Definitions of privacy notions: Full disclosure, Partial disclosure (λ-safe), Perfect Privacy
A General Approach for Constructing Private Randomized Auditors
https://gyazo.com/674540dd6c9da26ec1ba4a61162d6361
Implementations
Templ M, Kowarik A, Meindl B
sdcMicro: R package to anonymize microdata. Most functionalities of the package are also available via an interactive shiny-based graphical user interface. (docs)
"SOFTWARE TOOLS FOR STATISTICAL DISCLOSURE CONTROL" Slides
Aastha Mehta, Eslam Elnikety, Katura Harvey, Deepak Garg, and Peter Druschel
Qapla provides an alternate approach to policy enforcement that neither depends on application correctness, nor on specialized database support.
In Qapla, policies are specific to rows and columns, may additionally refer to the querier's identity and the time of the query, are specified in SQL, and are stored in the database itself.
Related
Femi Olumofin, Ian Goldberg
Example: Domain name registration